Representativity of First Authors in Psychology
A large proportion of first authors in psychology are located in North America or Europe, mostly in the US (Thalmayer et al., 2021, Arnett, 2008). In this dashboard, I present some aggregated data by continent, country, year, and journal (for first authors only).
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
The data from this report include information about publications from
from originally six psychology journals (Developmental
Psychology, Journal of Personality and Social Psychology,
Journal of Abnormal Psychology, Journal of Family
Psychology, Health Psychology, and Journal of
Educational Psychology) for years 1980 to 2023. The dashboard now
includes many more journals (see below for full list). They include
information about the articles (e.g., title, abstract) as well as on the
authors, such as university of affiliation. I have obtained these data
from PubMed using the PubMed API through the easyPubMed
package. I have determined the country of the first author of each paper
based on the affiliation address by matching the university name with a
world university names database obtained from GitHub.
The full PubMed query string was:
Developmental Psychology [Journal] OR Journal of Personality and Social Psychology [Journal] OR Journal of Abnormal Psychology [Journal] OR Journal of Family Psychology [Journal] OR Health Psychology [Journal] OR Journal of Educational Psychology [Journal] OR Journal of Experimental Social Psychology [Journal] OR Collabra. Psychology [Journal] OR Journal of Experimental Psychology. General [Journal] OR The Journal of Applied Psychology [Journal] OR Psychological Methods [Journal] OR Advances in Methods and Practices in Psychological Science [Journal] OR Psychological Science [Journal] OR Journal of Economic Psychology [Journal] OR Journal of Experimental and Behavioral Economics [Journal] OR Experimental Economics [Journal] OR Journal of Development Economics [Journal] OR World Development [Journal] OR Quarterly Journal of Economics [Journal] OR Econometrica [Journal] OR Behavioral Public Policy [Journal] OR Nature Human Behaviour [Journal] AND ('1980/01/01' [Date - Publication] : '2030/12/31' [Date - Publication])
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
Some of the papers were missing address information; in many cases, the PubMed API provided only the department and no university. It was not possible to identify the country in these cases (one would need to look at the actual papers one by one to make manual corrections). Furthermore, some university names from the data did not match the university name database obtained from GitHub. In some cases, I have brought manual corrections to university names in an attempt to reduce the number of missing values.
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
Possible future steps include: (a) obtaining a better, more current university name database (that includes country of university), (b) making manual corrections for other research institutes not included in the university database, (c) host DT tables on a server to speed up the website and allow the inclusion of a DT table for exploring the raw data, and (d) find a way to use country flags for the countries-by-journal figure.
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
* Percentages are calculated after excluding missing values. The Missing row shows the real percentage of missing values.
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
* Percentages are calculated after excluding missing values. The Missing column shows the real percentage of missing values.
---
title: "Neglected 95% Dashboard"
author: "Rémi Thériault"
output:
flexdashboard::flex_dashboard:
orientation: columns
vertical_layout: fill
social: menu
source_code: embed
# theme: lumen
storyboard: false
---
```{r setup, include=FALSE}
# library(flexdashboard)
query_pubmed <- TRUE
```
```{r packages}
# Load packages
library(pubmedDashboard)
library(dplyr)
```
```{r API_TOKEN_PUBMED, eval=query_pubmed, include=FALSE}
if(Sys.info()["sysname"] == "Windows") {
API_TOKEN_PUBMED <- keyring::key_get("pubmed", "rempsyc")
}
check_pubmed_api_token(API_TOKEN_PUBMED)
```
```{r save_process_pubmed_batch, results='hide', eval=query_pubmed}
# We got a little problem with PLOS One, Science, and Nature,
# So we exclude them for now
save_process_pubmed_batch(
journal = head(journal_field$journal, -3),
year_low = 2023,
year_high = 2030,
api_key = API_TOKEN_PUBMED)
```
# Continent
## Column 1 {data-width=2150}
### Waffle plot of journal paper percentages, by continent (each square = 1% of data) {data-height=600}
```{r get_historic_data}
articles.df4 <- read_bind_all_data()
# We filter for year 1987 because there are almost no publications before that
# And remove european journal of health psychology because we did not
# include it within the search
articles.df4 <- articles.df4 %>%
filter(year >= 1987,
journal != "European journal of health psychology")
```
```{r continent_waffle_overall}
articles.df4 <- clean_journals_continents(articles.df4)
saveRDS(articles.df4, "data/fulldata.rds")
waffle_continent(articles.df4)
```
### Table of journal paper percentages, by continent {data-height=200}
```{r, continent_table}
table_continent(articles.df4)
```
## Column 2 {.tabset .tabset-fade}
### Context
**Representativity of First Authors in Psychology**
A large proportion of first authors in psychology are located in North America or Europe, mostly in the US ([Thalmayer et al., 2021](https://psycnet.apa.org/doi/10.1037/amp0000622), [Arnett, 2008](https://doi.org/10.1037/0003-066x.63.7.602)). In this dashboard, I present some aggregated data by continent, country, year, and journal (for first authors only).
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
### Method & Data
The data from this report include information about publications from from originally six psychology journals (*Developmental Psychology*, *Journal of Personality and Social Psychology*, *Journal of Abnormal Psychology*, *Journal of Family Psychology*, *Health Psychology*, and *Journal of Educational Psychology*) for years 1980 to 2023. The dashboard now includes many more journals (see below for full list). They include information about the articles (e.g., title, abstract) as well as on the authors, such as university of affiliation. I have obtained these data from PubMed using the PubMed API through the `easyPubMed` package. I have determined the country of the first author of each paper based on the affiliation address by matching the university name with a world university names database obtained from GitHub.
The full PubMed query string was:
`Developmental Psychology [Journal] OR Journal of Personality and Social Psychology [Journal] OR Journal of Abnormal Psychology [Journal] OR Journal of Family Psychology [Journal] OR Health Psychology [Journal] OR Journal of Educational Psychology [Journal] OR Journal of Experimental Social Psychology [Journal] OR Collabra. Psychology [Journal] OR Journal of Experimental Psychology. General [Journal] OR The Journal of Applied Psychology [Journal] OR Psychological Methods [Journal] OR Advances in Methods and Practices in Psychological Science [Journal] OR Psychological Science [Journal] OR Journal of Economic Psychology [Journal] OR Journal of Experimental and Behavioral Economics [Journal] OR Experimental Economics [Journal] OR Journal of Development Economics [Journal] OR World Development [Journal] OR Quarterly Journal of Economics [Journal] OR Econometrica [Journal] OR Behavioral Public Policy [Journal] OR Nature Human Behaviour [Journal] AND ('1980/01/01' [Date - Publication] : '2030/12/31' [Date - Publication])`
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
### Missing data
Some of the papers were missing address information; in many cases, the PubMed API provided only the department and no university. It was not possible to identify the country in these cases (one would need to look at the actual papers one by one to make manual corrections). Furthermore, some university names from the data did not match the university name database obtained from GitHub. In some cases, I have brought manual corrections to university names in an attempt to reduce the number of missing values.
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
### Next Steps
Possible future steps include: (a) obtaining a better, more current university name database (that includes country of university), (b) making manual corrections for other research institutes not included in the university database, (c) host DT tables on a server to speed up the website and allow the inclusion of a DT table for exploring the raw data, and (d) find a way to use country flags for the countries-by-journal figure.
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
# Continent, by Year
## Column 1 {data-width=800}
### Scatter plot of journal paper percentages, by continent and year {data-height=600}
```{r, continent_scatter_overall}
scatter_continent_year(articles.df4, method = "loess")
```
## Column 2
### Table of journal paper percentages, by continent {data-height=200}
```{r, continent_table_journal_year}
table_continent_year(articles.df4)
```
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
# Continent, by Journal
## Column 1 {data-width=700}
### Waffle plot of journal paper percentages, by continent and journal (each square = 1% of data) {data-height=600}
```{r continent_table_journal_figure}
waffle_continent_journal(articles.df4)
```
## Column 2
### Table of journal paper percentages, by continent and journal {data-height=200}
```{r continent_table_journal}
table_continent_journal(articles.df4)
```
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
# Country
## Column 1 {data-width=800}
### Waffle plot of journal paper percentages, by country (each flag = 1% of data)
```{r country_table_overall, fig.width=4.5, fig.height=4.5}
waffle_country(articles.df4)
```
## Column 2
### Table of journal paper percentages, by country {data-height=200}
```{r country_table_journal}
table_country(articles.df4)
```
> \* Percentages are calculated after excluding missing values. The *Missing* row shows the real percentage of missing values.
# Country, by Year
## Column 1 {data-width=1000}
### Scatter plot of journal paper percentages, by country and year
```{r, country_series_year}
scatter_country_year(articles.df4, method = "lm")
# Include flags on scatter plot
# Note: doesn't work with ggplotly it seems
# library(ggflags)
# df.country.year %>%
# mutate(year = as.numeric(year),
# country = countrycode(country, "country.name", "genc2c"),
# country = tolower(country),
# country = as.factor(country)) %>%
# nice_scatter(
# predictor = "year",
# response = "percentage",
# group = "country",
# colours = colors,
# method = "lm",
# groups.order = "decreasing",
# ytitle = "% of All Papers") +
# geom_flag(aes(country = country)) +#%>%
# scale_country(aes(country = country)) #%>%
#ggplotly(tooltip = c("x", "y"))
# Time series dygraph
# dygraph_year(articles.df4)
# dygraph_year(articles.df4, "country")
```
## Column 2
### Table of journal paper percentages, by country and year {data-height=200}
```{r, country_table_year}
table_country_year(articles.df4)
```
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
# Country, by Journal
## Column 1 {data-width=800}
### Waffle plot of journal paper percentages, by continent and journal (each square = 1% of data) {data-height=600}
```{r country_table_journal_figure}
# df.country.journal.missing <- articles.df4 %>%
# filter(!is.na(country)) %>%
# count(journal, country, name = "Papers") %>%
# arrange(desc(journal), desc(Papers))
#
# get_journal_papers <- function(journal, country) {
# df.country.journal.missing[which(
# df.country.journal.missing$journal == journal &
# df.country.journal.missing$country == country), "Papers"]
# }
waffle_country_journal(articles.df4)
```
## Column 2
### Table of journal paper percentages, by country and journal {data-height=200}
```{r country_table_journal2}
table_country_journal(articles.df4)
```
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
# Psychology
## Column 1 {data-width=800}
### Scatter plot of journal paper percentages, by continent and year {data-height=600}
```{r, continent_scatter_overall_psychology}
articles.df4 %>%
filter(field == "psychology") %>%
scatter_continent_year(method = "loess")
```
## Column 2
### Table of journal paper percentages, by continent {data-height=200}
```{r, continent_table_journal_year_psychology}
articles.df4 %>%
filter(field == "psychology") %>%
table_continent_year()
```
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
# Economics
## Column 1 {data-width=800}
### Scatter plot of journal paper percentages, by continent and year {data-height=600}
```{r, continent_scatter_overall_economics}
articles.df4 %>%
filter(field == "economics") %>%
scatter_continent_year(method = "loess")
```
## Column 2
### Table of journal paper percentages, by continent {data-height=200}
```{r, continent_table_journal_year_economics}
articles.df4 %>%
filter(field == "economics") %>%
table_continent_year()
```
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
# General
## Column 1 {data-width=800}
### Scatter plot of journal paper percentages, by continent and year {data-height=600}
```{r, continent_scatter_overall_general}
articles.df4 %>%
filter(field == "general") %>%
scatter_continent_year(method = "loess")
```
## Column 2
### Table of journal paper percentages, by continent {data-height=200}
```{r, continent_table_journal_year_general}
articles.df4 %>%
filter(field == "general") %>%
table_continent_year()
```
> \* Percentages are calculated after excluding missing values. The *Missing* column shows the real percentage of missing values.
# Figure 1
## Column 1
### Figure 1, Proportion of American First Authors 1987-Today
```{r, fig1}
scatter_figure1(articles.df4, original = TRUE)
```
# Missing Data
## Column 1 {data-width=700}
### This table allows investigating why the country/university could not be identified
```{r missing_universities, warning=FALSE}
table_missing_country(articles.df4)
```
## Column 2